Abstract 

A dynamic Bayesian network (DBN) is more robust than an 
ordinary Bayesian network (BN) for modeling users' 
knowledge because it allows monitoring a user's process of 
gaining knowledge as well as evaluating her/his knowledge. 
However, the size of a DBN grows large when the 
process continues for a long time; thus, performing 
probabilistic inference becomes inefficient. Moreover, the 
number of transition dependencies among points in time is 
too large to compute posterior marginal probabilities when 
doing inference in a DBN. To overcome these difficulties, we 
propose a new algorithm in which both the size of the DBN and 
the number of conditional probability tables (CPTs) in the DBN 
are kept intact (not changed) when the process continues for a 
long time. The method includes six steps: initializing the DBN, 
specifying transition weights, re-constructing the DBN, 
normalizing weights of dependencies, re-defining CPTs, 
and probabilistic inference. Our algorithm also addresses the 
problem of temporary slip and lucky guess: "the learner does 
(doesn't) know a particular subject but there is solid 
evidence suggesting that she/he doesn't (does) understand it; 
this evidence just reflects a temporary slip (or lucky guess)". 

Keywords 

Dynamic Bayesian Network 

Introduction 

A user model is the representation of information about 
an individual that is essential for an adaptive system 
to provide the adaptation effect, i.e., to behave 
differently for different users. A user model must 
contain important information about the user, such as 
domain knowledge, learning performance, interests, 
preferences, goals, tasks, background, personal traits 
(learning style, aptitude...), environment (context of 
work) and other useful features. Such individual 
information can be divided into two categories: 
domain-specific information and domain-independent 
information. Knowledge, one of the user's important 
features, is considered domain-specific information. 

Knowledge information is organized as a knowledge 
model. A knowledge model has many elements (concept, 
topic, subject...) which the student needs to learn. There 
are many methods to build up a knowledge model, such 
as the stereotype model, overlay model, differential 
model, perturbation model and plan model, which is 
the main subject of this paper. In the overlay method, the 
domain is decomposed into a set of knowledge 
elements and the overlay model (namely, the user model) 
is simply a set of masteries over those elements. The 
overlay model and the BN are combined through the 
following steps: 

- The structure of the overlay model is translated into 
a BN; each user knowledge element becomes a 
variable in the BN. 

- Each prerequisite relationship between domain 
elements in the overlay model becomes a conditional 
dependence assertion signified by the CPT of each 
variable in the BN. 

Our approach is to improve the knowledge model by 
using a DBN instead of a BN, because a BN has 
some drawbacks, which are described in section 
2. Our method is proposed in section 3, and section 4 is 
the conclusion. 

Dynamic Bayesian Network 

Bayesian Network 

A Bayesian network (BN) is a directed acyclic graph 
(DAG) in which nodes are linked together by arcs; 
each arc expresses a dependence relationship (or 
causal relationship) between nodes. Nodes 
represent random variables. The strengths of 
dependences are quantified by conditional probability 
tables (CPTs). When one variable is conditionally 
dependent on another, there is a corresponding 
probability in the CPT measuring the strength of that 
dependence; in other words, each CPT represents the 
local conditional probability distribution of a variable. 
Suppose a BN G = {X, Pr(X)}, where X and Pr(X) denote a 
set of random variables and a global joint probability 
distribution, respectively. X is defined as a random 
vector $X = \{x_1, x_2, \ldots, x_n\}$ whose cardinality is n. The 
subset E of X is a set of evidences, 
$E = \{e_1, e_2, \ldots, e_k\} \subset X$. Each $e_i$ is called an 
evidence variable, or evidence in brief. 

For example, in figure 1, the event "cloudy" is a cause of the 
event "rain", and "rain" and "sprinkler" are in turn causes of 
"grass is wet". So we have three causal relationships: (1) 
cloudy to rain, (2) rain to wet grass, (3) sprinkler to wet 
grass. This model is expressed by a Bayesian network 
with four variables and three arcs corresponding to 
four events and three dependence relationships. Each 
variable is binary, with two possible values True (1) 
and False (0), and has its own CPT. 

P(C=1) P(C=0) 

0.5 0.5 



FIG. 1 BAYESIAN NETWORK (A CLASSIC EXAMPLE ABOUT 
"WET GRASS") 

Suppose we use $x_i$ and $pa(x_i)$ to denote a 
node and the set of its parents, respectively. The 
global joint probability distribution Pr(X), abbreviated 
GJPD, is the product of all local CPT(s): 

$$\Pr(X) = \Pr(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \Pr(x_i \mid pa(x_i)) \quad (1)$$

Note that $\Pr(x_i \mid pa(x_i))$ is the CPT of $x_i$. According to 
Bayes' rule, given E, the posterior probability of 
variable $x_i$ is computed as below: 


$$\Pr(x_i \mid E) = \frac{\Pr(E \mid x_i)\,\Pr(x_i)}{\Pr(E)} \quad (2)$$

where $\Pr(x_i)$ is the prior probability of random variable 
$x_i$, $\Pr(E \mid x_i)$ is the conditional probability of E occurring 
when $x_i$ is true, and Pr(E) is the probability of E occurring 
over all mutually exclusive cases of X. Applying 
(1) to (2), we have: 


$$\Pr(x_i \mid E) = \frac{\sum_{X \setminus (\{x_i\} \cup E)} \Pr(x_1, x_2, \ldots, x_n)}{\sum_{X \setminus E} \Pr(x_1, x_2, \ldots, x_n)} \quad (3)$$


The posterior probability $\Pr(x_i \mid E)$ is based on the GJPD 
Pr(X). Applying (1) to the BN in figure 1, we have: 

Pr(C,R,S,W) = Pr(C)*Pr(R | C)*Pr(S | C)*Pr(W | C,R,S) = 
Pr(C)*Pr(S)*Pr(R | C)*Pr(W | C,R,S), due to Pr(S | C) = Pr(S). 

This is the conditional independence assertion about 
variables S and C. Suppose W becomes an evidence 
variable because the fact that the grass is wet has been 
observed; so W has value 1. The question to answer is 
which cause (sprinkler or rain) is more likely for the wet 
grass. Hence, we calculate two posterior probabilities, 
of S (=1) and of R (=1), under the condition W (=1). These 
probabilities are also called explanations for W. 
Applying (3), we have: 

$$\Pr(R=1 \mid W=1) = \frac{\sum_{C,S} \Pr(C, R=1, S, W=1)}{\sum_{C,R,S} \Pr(C, R, S, W=1)} = 0.581$$

$$\Pr(S=1 \mid W=1) = \frac{\sum_{C,R} \Pr(C, R, S=1, W=1)}{\sum_{C,R,S} \Pr(C, R, S, W=1)} = \frac{0.4725}{0.7695} = 0.614$$


Because the posterior probability of S, Pr(S=1 | W=1), is 
larger than the posterior probability of R, Pr(R=1 | W=1), 
it is concluded that the sprinkler is the most likely cause of 
the wet grass. 
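
The enumeration in formula (3) can be sketched in a few lines of Python. This is only an illustration: apart from Pr(C=1) = 0.5, the CPT values below are hypothetical assumptions (the paper's CPTs for R, S and W are not recoverable from the text), so the resulting posteriors are not expected to reproduce 0.581 and 0.614. The function names `bernoulli`, `joint` and `posterior` are likewise introduced here for illustration.

```python
from itertools import product

# Hypothetical CPTs (assumptions for illustration, not the paper's tables).
P_C1 = 0.5                        # Pr(C=1), as in the CPT shown for C
P_R1_GIVEN_C = {1: 0.8, 0: 0.2}   # Pr(R=1 | C)
P_S1 = 0.5                        # Pr(S=1); S is independent of C
P_W1_GIVEN_RS = {(0, 0): 0.0, (1, 0): 0.8, (0, 1): 0.9, (1, 1): 0.99}


def bernoulli(p_true, value):
    """Probability of a binary value given the probability of value 1."""
    return p_true if value == 1 else 1.0 - p_true


def joint(c, r, s, w):
    """Global joint probability as the product of local CPTs, formula (1)."""
    return (bernoulli(P_C1, c)
            * bernoulli(P_R1_GIVEN_C[c], r)
            * bernoulli(P_S1, s)
            * bernoulli(P_W1_GIVEN_RS[(r, s)], w))


def posterior(query, evidence):
    """Pr(query | evidence) by summing the joint over all worlds, formula (3)."""
    names = ("C", "R", "S", "W")
    numerator = denominator = 0.0
    for values in product((0, 1), repeat=len(names)):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue  # world contradicts the evidence
        p = joint(world["C"], world["R"], world["S"], world["W"])
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


post_rain = posterior({"R": 1}, {"W": 1})
post_sprinkler = posterior({"S": 1}, {"W": 1})
```

Under these assumed CPTs the sprinkler again receives the larger posterior, which agrees qualitatively with the conclusion drawn in the text.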


Dynamic Bayesian Network 

A BN provides a powerful inference mechanism based 
on evidences, but it cannot model temporal 
relationships between variables. It only represents 
the DAG at a certain time point. In some situations, 
capturing the dynamic (temporal) aspect is very 
important; especially in the e-learning context, it is 
necessary to monitor users' process of gaining 
knowledge chronologically. So the purpose of a dynamic 
Bayesian network (DBN) is to model the temporal 
relationships among variables; in other words, it 
represents the DAG over a time series. 

Suppose we have some finite number T of time points; 
let $x_i[t]$ be the variable representing the value of $x_i$ at 
time t, where $0 \le t \le T$. Let X[t] be the temporal random 
vector denoting the random vector X at time t, 
$X[t] = \{x_1[t], x_2[t], \ldots, x_n[t]\}$. A DBN (Neapolitan 2003) is 
defined as a BN containing variables that comprise T 
variable vectors X[t] and is determined by the following 
specifications: 

- An initial BN $G_0 = \{X[0], \Pr(X[0])\}$ at the first time point t = 0. 

- A transition BN, which is a template consisting of a 
transition DAG $G_{\to}$ containing the variables in 
$X[t] \cup X[t+1]$ and a transition probability 
distribution $\Pr_{\to}(X[t+1] \mid X[t])$. 

In short, the DBN consists of the initial DAG $G_0$ and 
the transition DAG $G_{\to}$ evaluated at time t, where 
$0 \le t \le T$. The global joint probability distribution of 
the DBN, abbreviated DGJPD, is the product of the probability 
distribution of $G_0$ and all $\Pr_{\to}$ evaluated 
at all time points, which is denoted as follows: 

$$\Pr(X[0], X[1], \ldots, X[T]) = \Pr(X[0]) \prod_{t=0}^{T-1} \Pr_{\to}(X[t+1] \mid X[t]) \quad (4)$$

Note that the transition (temporal) probability can be 
considered the transition (temporal) dependency. 



FIG. 2 DBN FOR t = 0, 1, 2 

Non-evidence variables are not shaded, whereas 
evidence variables are shaded. Dashed lines denote the 
transition probabilities (transition dependencies) of 
$G_{\to}$ between consecutive points in time. 

The essence of learning a DBN is to specify the initial 
BN and the transition probability distribution $\Pr_{\to}$. 
According to Murphy (2002, p. 127), it is possible to 
specify the transition probability distribution $\Pr_{\to}$ by 
applying the score-based approach, which selects the 
optimal probabilistic network according to some 
criteria. This can be done by backward or forward selection or 
the leaps and bounds algorithm (Hastie, Tibshirani, 
and Friedman 2001). We can use a greedy search or 
MMC algorithm to select the best output DBN. 
Friedman, Murphy and Russell (1998) propose the 
BIC score and the BDe score as criteria to select and learn 
a DBN from complete and incomplete data. This 
approach uses the structural expectation maximization 
(SEM) algorithm, which combines network structure and 
parameter learning into a single expectation maximization (EM) 
process (Friedman, Murphy and Russell 1998). Some 
other algorithms, such as the Baum-Welch algorithm (Mills), 
take advantage of the similarity between the DBN and the hidden 
Markov model (HMM) in order to learn a DBN from the 
perspective of the HMM, since the HMM is a simple case of 
the DBN. In general, learning a DBN is an extension of 
learning a static BN, and there are two main BN learning 
approaches (Neapolitan 2003): 

- Score-based approach: given a scoring criterion S 
assigned to every BN, the BN that gains the highest S is 
the best BN. This criterion S is computed as the 
posterior probability over the whole BN given the training 
data set. 

- Constraint-based approach: given a set of 
constraints, the BN that satisfies all such 
constraints is the best BN. Constraints are defined 
as rules relating to the Markov condition. 

These approaches can give precise results with the 
best-learned DBN, but they become inefficient when 
the number of variables gets huge. It is impossible to 
learn a DBN in the same way as a static BN 
when the training data is enormous. Moreover, these 
approaches cannot respond in real time if a DBN 
must be created from a continuous and 
instant data stream. The drawbacks of inference in 
DBN and the proposal of this research follow. 

Drawbacks of Inferences in DBN 

Formula (4) is an extension of formula (1); so 
the posterior probability of each temporal variable is 
now computed by using the DGJPD in formula (4), which is 
much more complex than the ordinary GJPD in formula (1). 
Whenever the posterior of a variable evaluated at time 
point t needs to be computed, all temporal random 
vectors X[0], X[1],..., X[t] must be included when 
executing Bayes' rule, because the DGJPD is the product of 
all transition $\Pr_{\to}$ evaluated at t points in time. 
Suppose the initial DAG has n variables 
($X[0] = \{x_1[0], x_2[0], \ldots, x_n[0]\}$); then there are n*(t+1) 
temporal variables concerned in the time series 
(0, 1, 2,..., t). It is impossible 
to take into account such an extremely large number of 
temporal variables in $X[0] \cup X[1] \cup \ldots \cup X[t]$. In other 
words, the size of the DBN grows large when the 
process continues for a long time; thus, performing 
probabilistic inference becomes inefficient. 

Moreover, supposing $G_0$ has n variables, we must specify 
n*n transition dependencies between variables 
$x_i[t] \in X[t]$ and variables $x_j[t+1] \in X[t+1]$. Through t 
points in time, there are n*n*t transition dependencies. 
So it is impossible to compute effectively the transition 
probability distribution $\Pr_{\to}(X[t+1] \mid X[t])$ and the 
DGJPD in (4). 
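
To make the growth concrete, the two counts given above, n*(t+1) temporal variables and n*n*t transition dependencies, can be tabulated with a short sketch. The function name `unrolled_dbn_size` is introduced here only for illustration.

```python
def unrolled_dbn_size(n, t):
    """Return (temporal variables, transition dependencies) for a DBN
    with n variables per slice unrolled over time points 0..t."""
    variables = n * (t + 1)    # slices 0, 1, ..., t with n variables each
    transitions = n * n * t    # n*n dependencies per pair of consecutive slices
    return variables, transitions


# Sizes for a 3-variable network after 1, 10 and 100 time steps.
sizes = [unrolled_dbn_size(3, t) for t in (1, 10, 100)]
```

Both counts grow linearly in t, so an inference procedure that enumerates the full joint distribution over all slices becomes impractical as the process continues.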

Using Dynamic Bayesian Network to Model 
User's Knowledge 

To overcome the drawbacks of DBN, we propose a new 
algorithm in which both the size of the DBN and the number of 
CPT(s) in the DBN are kept intact (not changed) when the 
process continues for a long time. However, we should 
glance over some definitions before discussing our 
method. Given that $pa_i[t+1]$ is the set of parents of $x_i$ at time 
point t+1, namely the parents of $x_i[t+1]$, the transition 
probability distribution is computed as below: 

$$\Pr_{\to}(X[t+1] \mid X[t]) = \prod_{i=1}^{n} \Pr_{\to}(x_i[t+1] \mid pa_i[t+1]) \quad (5)$$

Applying (5) for all X and for all t, we have: 

$$\Pr_{\to}(X[t+1] \mid X[0], X[1], \ldots, X[t]) = \Pr_{\to}(X[t+1] \mid X[t]) \quad (6)$$

If the DBN fully meets (6), it has the Markov property; 
namely, given the current time point t, the conditional 
probability of the next time point t+1 is relevant only to 
the current time point t, not to any past time 
point (t-1, t-2,..., 0). Furthermore, the DBN is stationary 
if $\Pr_{\to}(X[t+1] \mid X[t])$ is the same for all t. We propose a 
new algorithm for modeling and inferring user's 
knowledge by using DBN. 

Suppose the DBN is stationary and has the Markov property. 
Each time evidences occur, the DBN is 
re-constructed and probabilistic inference is done 
by the following six steps: 

- Step 1: Initializing DBN 

- Step 2: Specifying transition weights 

- Step 3: Re-constructing DBN 

- Step 4: Normalizing weights of dependencies 

- Step 5: Re-defining CPT(s) 

- Step 6: Probabilistic inference 

The six steps are repeated whenever evidences occur. Each 
iteration gives the view of the DBN at a certain point in time. 
After the t-th iteration, the posterior marginal probability of 
the random vector X in the DBN will approach a certain limit; 
this means that the DBN converges at that time. 

Because an extremely large number of variables are 
included in a DBN over a long time, we focus on a 
subclass of DBN in which the networks at different time 
steps are connected only through non-evidence 
variables ($x_i$). 

Suppose there is a course in which the domain model 
has four knowledge elements $x_1$, $x_2$, $x_3$, $e_1$. The item $e_1$ is 
the evidence that tells us how well learners have mastered 
$x_1$, $x_2$, $x_3$. This domain model is represented as a 
BN having three non-evidence variables $x_1$, $x_2$, $x_3$ and 
one evidence variable $e_1$. The weight of an arc from a 
parent variable to a child variable represents the 
strength of the dependency between them. In other words, 
when $x_2$ and $x_3$ are prerequisites of $x_1$, knowing $x_2$ and 
$x_3$ has a causal influence on knowing $x_1$. For instance, 
the weight of the arc from $x_2$ to $x_1$ measures the relative 
importance of $x_2$ in $x_1$. This BN, used as the 
example for our algorithm, is shown in figure 3. 




FIG. 4 INITIAL DBN DERIVED FROM BN IN FIGURE 3 

Step 1: Initializing DBN 

If t > 0, jump to step 2. Otherwise, all variables 
(nodes) and dependencies (arcs) among the variables of the 
initial BN $G_0$ must be specified. The strength of a 
dependency is considered the weight of the corresponding arc. 

Step 2: Specifying Transition Weight 

We are given two factors, slip and guess, where the slip (guess) 
factor expresses the situation in which the user does (doesn't) 
know a particular subject but there is solid evidence 
suggesting that she/he doesn't (does) understand it; 
this evidence just reflects a temporary slip (or lucky 
guess). The slip factor is essentially the probability that the user 
knew concept/subject x before but has forgotten 
it now. Conversely, the guess factor is essentially the probability 
that the user did not know concept/subject x before but 
knows it now. Suppose x[t] and x[t+1] denote 
the user's state of knowledge about x at two 
consecutive time points t and t+1, respectively. Both x[t] 
and x[t+1] are temporal variables referring to the same 
knowledge element x. 

slip = Pr(not x[t+1] | x[t]) 
guess = Pr(x[t+1] | not x[t]) 

(where 0 < guess, slip < 1) 

So the conditional probability (named a) of the event that the 
user knows x[t+1] given the event that she/he already 
knew x[t] has value 1 - slip. Proof: 

a = Pr(x[t+1] | x[t]) = 1 - Pr(not x[t+1] | x[t]) = 1 - slip 
The bias b is defined as the difference in the amount of 
knowledge the user gains about x between t and t+1. 

$$b = \frac{1}{1 + \Pr(x[t+1] \mid \text{not } x[t])} = \frac{1}{1 + guess}$$

Now the weight w expressing the strength of the dependency 
between x[t] and x[t+1] is defined as the product of the 
conditional probability a and the bias b. 

$$w = a \cdot b = (1 - slip) \cdot \frac{1}{1 + guess} \quad (5)$$

Extending to temporal random vectors, w is 
considered the weight of the arcs from temporal vector 
X[t] to temporal vector X[t+1]. Thus the weight w 
represents the conditional transition probability of 
X[t+1] given X[t]: 

$$w \approx \Pr_{\to}(X[t+1] \mid X[t]) = \Pr_{\to}(X[t] \mid X[t-1])$$

So w is called the temporal weight or transition weight, 
and all transition dependencies have the same weight 
w. Suppose slip = 0.3 and guess = 0.2 in our example; 
we have 

$$w = (1 - 0.3) \cdot \frac{1}{1 + 0.2} = 0.58$$
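
The computation of the transition weight can be sketched as follows. The sketch reads the bias as b = 1/(1 + guess), which is the reading consistent with the worked value 0.58 for slip = 0.3 and guess = 0.2; the function name `transition_weight` is introduced here for illustration.

```python
def transition_weight(slip, guess):
    """Transition weight w = a * b with a = 1 - slip and b = 1 / (1 + guess)."""
    if not (0.0 <= slip <= 1.0 and 0.0 <= guess <= 1.0):
        raise ValueError("slip and guess must be probabilities in [0, 1]")
    a = 1.0 - slip           # Pr(x[t+1] | x[t])
    b = 1.0 / (1.0 + guess)  # bias term
    return a * b


# Example from the text: slip = 0.3, guess = 0.2.
w = transition_weight(slip=0.3, guess=0.2)
w_rounded = round(w, 2)
```

The unrounded value is 0.7/1.2; the text rounds it to two decimals and uses 0.58 in the later steps.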



FIG. 5 TRANSITION WEIGHTS 


Step 3: Re-constructing DBN 

Because our DBN is stationary and has the Markov 
property, we only need to focus on its immediately preceding 
state at any point in time. We consider the DBN at two consecutive 
time points t-1 and t. For each time point t, we create a 
new BN G'[t] whose variables include all variables in 
$X[t-1] \cup X[t]$ except the evidences in X[t-1]. G'[t] is called the 
augmented BN at time point t. The set of such 
variables is denoted Y. 

$$Y = (X[t-1] \cup X[t]) \setminus E[t-1] = \{x_1[t-1], \ldots, x_n[t-1], x_1[t], \ldots, x_n[t]\} \setminus \{e_1[t-1], \ldots, e_k[t-1]\}$$

where E[t-1] is the set of evidences at time point t-1. 

A very important fact to note is that all conditional 
dependencies among 
variables in X[t-1] are removed from G'[t]. This means 
that no arc (or CPT) within X[t-1] exists in G'[t] now. 
However, each pair of variables $x_i[t-1]$ and $x_i[t]$ has a 
transition dependency, which is added to G'[t]. The 
strength of such a dependency is the weight w specified 
in (5). Hence every $x_i[t]$ in X[t] has a parent which 
is a variable in X[t-1], and the temporal 
relationship between them is weighted. Vector X[t-1] 
becomes the input of vector X[t]. 










FIG. 6 AUGMENTED DBN AT TIME POINT t 

Dashed lines denote transition dependencies. The 
augmented DBN is much simpler than the DBN in figure 2. 
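
The re-construction in step 3 can be sketched as a function that builds the parent sets of the augmented BN. This is a minimal sketch under stated assumptions: the dictionary representation, the `(name, slice)` keys and the function name `build_augmented_parents` are hypothetical, and because the arcs involving the evidence e1 are not recoverable from the text, e1 is left without parents here.

```python
def build_augmented_parents(intra_parents, evidence):
    """Build the parent sets of the augmented BN G'[t].

    intra_parents maps each variable to its parents inside one time slice;
    evidence is the set of evidence variable names.
    """
    parents = {}
    for var, var_parents in intra_parents.items():
        current = (var, "t")
        if var in evidence:
            # Evidences of slice t-1 are excluded from Y; only the copy at t exists.
            parents[current] = [(p, "t") for p in var_parents]
            continue
        previous = (var, "t-1")
        parents[previous] = []  # arcs inside X[t-1] are removed
        # Transition arc from x[t-1] to x[t], plus the arcs inside slice t.
        parents[current] = [previous] + [(p, "t") for p in var_parents]
    return parents


# Example network of the text: x2 and x3 are prerequisites of x1.
intra = {"x1": ["x2", "x3"], "x2": [], "x3": [], "e1": []}
augmented = build_augmented_parents(intra, evidence={"e1"})
```

For the three non-evidence variables this yields six nodes plus the evidence node at time t, independent of how many iterations have already run.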

Step 4: Normalizing Weights of Dependencies 

Suppose $x_1[t]$ has two parents $x_2[t]$ and $x_3[t]$. The 
weights of the two arcs from $x_2[t]$ and $x_3[t]$ to $x_1[t]$ are 
$w_2$ and $w_3$, respectively. The essence of these weights is the 
strength of the dependencies inside the random vector X[t]. 

$$w_2 + w_3 = 1$$

Now, in the augmented DBN, the transition weight of the 
temporal arc from $x_1[t-1]$ to $x_1[t]$ is specified according 
to (5): 

$$w_1 = a \cdot b = (1 - slip) \cdot \frac{1}{1 + guess}$$

The weights $w_1$, $w_2$, $w_3$ must be normalized because 
their sum is larger than 1, $w_1 + w_2 + w_3 > 1$: 

$$w_2 = w_2 (1 - w_1), \quad w_3 = w_3 (1 - w_1) \quad (6)$$

Suppose S is the sum of $w_1$, $w_2$ and $w_3$ after normalization; we have: 

$$S = w_1 + w_2 (1 - w_1) + w_3 (1 - w_1) = w_1 + (w_2 + w_3)(1 - w_1) = w_1 + (1 - w_1) = 1$$

Extending (6) to the general case, suppose variable $x_i[t]$ 
has k-1 weights $w_{i2}, w_{i3}, \ldots, w_{ik}$ corresponding to its k-1 
parents and a transition weight $w_{i1}$ for the temporal 
relationship between $x_i[t-1]$ and $x_i[t]$. We have: 

$$w_{i2} = w_{i2}(1 - w_{i1}), \quad w_{i3} = w_{i3}(1 - w_{i1}), \quad \ldots, \quad w_{ik} = w_{ik}(1 - w_{i1}) \quad (7)$$

After normalizing the weights following formula (7), the 
transition weight $w_{i1}$ is kept intact but the other weights $w_{ij}$ 
(j > 1) get smaller. So the meaning of formula (7) is to 
focus on transition probability and knowledge 
accumulation. Because this formula is a suggestion, 
you can define another one yourself. 

FIG. 7 AUGMENTED DBN WHOSE WEIGHTS ARE 
NORMALIZED 

Let $W_i[t]$ be the set of weights relevant to a variable 
$x_i[t]$; we have: 

$W_i[t] = \{w_{i1}, w_{i2}, w_{i3}, \ldots, w_{ik}\}$ where $w_{i1} + w_{i2} + \ldots + w_{ik} = 1$ 


TABLE 1 THE WEIGHTS RELATING TO x1[t] ARE NORMALIZED 

                      w11     w12     w13
x1[t]                 0.58    0.6     0.4
x1[t] (normalized)    0.58    0.252   0.168

Figure 7 shows the variant of the augmented DBN (in 
figure 6) whose weights are normalized. 
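
The normalization of step 4 can be sketched as below, assuming the intra-slice weights of a variable already sum to 1 as stated in the text. The function name `normalize_weights` is introduced here for illustration; a variable without parents inside the slice keeps only its transition weight.

```python
def normalize_weights(transition_w, parent_weights):
    """Scale intra-slice weights by (1 - transition_w), formula (7).

    Returns [w_i1, w_i2, ..., w_ik] with the transition weight first.
    """
    if parent_weights and abs(sum(parent_weights) - 1.0) > 1e-9:
        raise ValueError("intra-slice weights must sum to 1")
    scaled = [w * (1.0 - transition_w) for w in parent_weights]
    return [transition_w] + scaled


# Weights of x1[t] from Table 1: w11 = 0.58, w12 = 0.6, w13 = 0.4.
weights = normalize_weights(0.58, [0.6, 0.4])
total = sum(weights)
```

With the values of Table 1 the scaled weights are 0.252 and 0.168, and the three weights sum to 1 as required.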


Step 5: Re-defining CPT(s) 

There are two random vectors X[t-1] and X[t]. So 
defining the CPT(s) of the DBN includes determining the CPT for 
each variable $x_i[t-1] \in X[t-1]$ and re-defining the CPT for 
each variable $x_i[t] \in X[t]$. 


1. Determining CPT(s) of X[t-1]. The CPT of $x_i[t-1]$ consists of 
the posterior probabilities which were computed in 
step 6 of the previous iteration. 

$$\Pr(x_i[t-1] \mid E[t-1]) = \frac{\sum_{X \setminus (\{x_i\} \cup E)} \Pr(x_1[t-1], x_2[t-1], \ldots, x_n[t-1])}{\sum_{X \setminus E} \Pr(x_1[t-1], x_2[t-1], \ldots, x_n[t-1])}$$

(see step 6) 

TABLE 2 CPT OF x1[t-1] 

Pr(x1[t-1]=1)                              Pr(x1[t-1]=0)
α1: the posterior probability of x1        1 - α1
computed at previous iteration

TABLE 3 CPT OF x2[t-1] 

Pr(x2[t-1]=1)                              Pr(x2[t-1]=0)
α2: the posterior probability of x2        1 - α2
computed at previous iteration

TABLE 4 CPT OF x3[t-1] 

Pr(x3[t-1]=1)                              Pr(x3[t-1]=0)
α3: the posterior probability of x3        1 - α3
computed at previous iteration


2. Re-defining CPT(s) of X[t]. Suppose $pa_i[t] = \{y_1, y_2, \ldots, y_k\}$ 
is the set of parents of $x_i[t]$ at time point t and 
$W_i[t] = \{w_{i1}, w_{i2}, \ldots, w_{ik}\}$ is the set of weights which express 
the strength of the dependencies between $x_i$ and 
$pa_i[t]$. Note that $W_i[t]$ is specified in step 4. The 
conditional probability of variable $x_i[t]$ given its 
parents $pa_i[t]$ is denoted $\Pr(x_i[t] \mid pa_i[t])$. So 
$\Pr(x_i[t] \mid pa_i[t])$ represents the CPT of $x_i[t]$. 

$$\Pr(x_i[t] = 1 \mid pa_i[t]) = \sum_{j=1}^{k} w_{ij} h_{ij}$$

$$\text{where } h_{ij} = \begin{cases} 1 & \text{if } y_j = 1 \\ 0 & \text{otherwise} \end{cases}$$

$$\Pr(x_i[t] = 0 \mid pa_i[t]) = 1 - \Pr(x_i[t] = 1 \mid pa_i[t])$$

TABLE 5 CPT OF x1[t] 

x1[t-1]  x2[t]  x3[t]  Pr(x1[t]=1)                           Pr(x1[t]=0)
1        1      1      1.0   (0.58*1+0.252*1+0.168*1)        0.0
1        1      0      0.832 (0.58*1+0.252*1+0.168*0)        0.168
1        0      1      0.748 (0.58*1+0.252*0+0.168*1)        0.252
1        0      0      0.58  (0.58*1+0.252*0+0.168*0)        0.42
0        1      1      0.42  (0.58*0+0.252*1+0.168*1)        0.58
0        1      0      0.252 (0.58*0+0.252*1+0.168*0)        0.748
0        0      1      0.168 (0.58*0+0.252*0+0.168*1)        0.832
0        0      0      0.0   (0.58*0+0.252*0+0.168*0)        1.0


TABLE 6 CPT OF x2[t] 

x2[t-1]  Pr(x2[t]=1)      Pr(x2[t]=0)
1        0.58 (0.58*1)    0.42
0        0.0  (0.58*0)    1.0

TABLE 7 CPT OF x3[t] 

x3[t-1]  Pr(x3[t]=1)      Pr(x3[t]=0)
1        0.58 (0.58*1)    0.42
0        0.0  (0.58*0)    1.0

TABLE 8 CPT OF e1[t] 

Pr(e1[t]=1)                        Pr(e1[t]=0)
0.5 (use uniform distribution)     0.5 (use uniform distribution)


FIG. 8 AUGMENTED DBN AND ITS CPT(s) 
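
The weighted-sum rule for re-defining a CPT can be sketched as follows. The function name `build_cpt` and the tuple-keyed dictionary are illustrative choices, not part of the original method; the weights are those of the running example, with the transition weight listed first.

```python
from itertools import product


def build_cpt(weights):
    """Build a CPT from normalized weights.

    Keys are tuples of binary parent values (in the same order as weights);
    values are (Pr(x=1 | parents), Pr(x=0 | parents)).
    """
    cpt = {}
    for parent_values in product((1, 0), repeat=len(weights)):
        # Sum of the weights of the parents whose value is 1.
        p_true = sum(w * v for w, v in zip(weights, parent_values))
        cpt[parent_values] = (p_true, 1.0 - p_true)
    return cpt


# x1[t] has parents (x1[t-1], x2[t], x3[t]); x2[t] has parent x2[t-1].
cpt_x1 = build_cpt([0.58, 0.252, 0.168])
cpt_x2 = build_cpt([0.58])
```

The entries of `cpt_x1` correspond row by row to Table 5, and those of `cpt_x2` to Table 6, up to floating-point rounding.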

Step 6: Probabilistic Inference 

Probabilistic inference in our augmented DBN can 
be done in the same way as in an ordinary Bayesian network, by 
using formula (3). It is essential to compute the 
posterior probabilities of the non-evidence variables in X[t]. 
This significantly decreases the expense of computation 
regardless of the large number of variables in the DBN over a 
long time. At any time point, it is only necessary to examine 2*n 
variables if the DAG has n variables, instead of 
including n*(t+1) variables and n*n*t transition 
probabilities given time point t. Each posterior 
probability of $x_i[t] \in X[t]$ is computed below. 


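$$\Pr(x_i[t]) = \Pr(x_i[t] \mid E[t]) = \frac{\sum_{X \setminus (\{x_i\} \cup E)} \Pr(x_1[t], x_2[t], \ldots, x_n[t])}{\sum_{X \setminus E} \Pr(x_1[t], x_2[t], \ldots, x_n[t])}$$

where E[t] is the set of evidences occurring at time point t. 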


Such posterior probabilities are also used for determining the 
CPT(s) of the DBN in step 5 of the next iteration. For example, the 
posterior probabilities of x1[t], x2[t] and x3[t] are α1, α2 
and α3, respectively. Note that it is not required to 
compute the posterior probabilities of X[t-1]. If the 
posterior probabilities are the same as in the previous 
iteration, the DBN has converged: all posterior 
probabilities of the variables $x_i[t]$ have reached stable 
values. If so, we can stop the algorithm; otherwise we 
return to step 1. 

TABLE 9 THE RESULTS OF PROBABILISTIC INFERENCE 

Pr(x1[t])    α1
Pr(x2[t])    α2
Pr(x3[t])    α3

Posterior probabilities are used for determining the CPT(s) 
of the DBN in step 5 of the next iteration. 
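
One iteration of inference over the augmented DBN can be sketched by brute-force enumeration of the 2*n = 6 non-evidence variables. The sketch rests on several assumptions: the previous posteriors in `ALPHA` are hypothetical values chosen only for illustration, the evidence e1 is omitted because its arcs are not recoverable from the text, and the names `NAMES`, `joint` and `posterior` are introduced here. The CPTs follow Tables 2 to 7.

```python
from itertools import product

W = 0.58                                     # transition weight of the example
ALPHA = {"x1": 0.7, "x2": 0.6, "x3": 0.5}    # hypothetical posteriors from iteration t-1
# A trailing "p" marks the copy of a variable in the previous slice t-1.
NAMES = ("x1p", "x2p", "x3p", "x1", "x2", "x3")


def bernoulli(p_true, value):
    """Probability of a binary value given the probability of value 1."""
    p_true = min(1.0, p_true)  # guard against floating-point overshoot
    return p_true if value == 1 else 1.0 - p_true


def joint(a):
    """Joint probability of one assignment over the augmented BN."""
    p = 1.0
    for name in ("x1", "x2", "x3"):
        p *= bernoulli(ALPHA[name], a[name + "p"])   # Tables 2-4
    p *= bernoulli(W * a["x2p"], a["x2"])            # Table 6
    p *= bernoulli(W * a["x3p"], a["x3"])            # Table 7
    p_x1 = 0.58 * a["x1p"] + 0.252 * a["x2"] + 0.168 * a["x3"]
    p *= bernoulli(p_x1, a["x1"])                    # Table 5
    return p


def posterior(query, evidence=None):
    """Pr(query = 1 | evidence) by enumeration, as in formula (3)."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product((0, 1), repeat=len(NAMES)):
        a = dict(zip(NAMES, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue
        p = joint(a)
        denominator += p
        if a[query] == 1:
            numerator += p
    return numerator / denominator


# Posteriors of X[t]; they would become the CPTs of X[t-1] in the next iteration.
new_alpha = {name: posterior(name) for name in ("x1", "x2", "x3")}
```

The enumeration always covers 64 assignments for this example, regardless of how many iterations have already been run, which is the saving the method aims at.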


Conclusions 

Our basic idea is to minimize the size of the DBN and the 
number of transition probabilities in order to decrease the 
expense of computation when the process of inference 
continues for a long time. Supposing the DBN is stationary 
and has the Markov property, we define two factors, slip 
and guess, to specify the same weight for all transition 
relationships (temporal relationships) among time 
points instead of specifying a large number of transition 
probabilities. The augmented DBN composed at a given 
time point t has just two random vectors, X[t-1] and 
X[t]; so it is only necessary to examine 2*n variables if the DAG 
has n variables, instead of including n*(t+1) variables and 
n*n*t transition probabilities. Specifying the slip 
factor and the guess factor addresses the problem of 
temporary slip and lucky guess. 

The process of inference, including six steps, is done in 
succession through many iterations; the result of the 
current iteration is the input for the next iteration. After the 
t-th iteration, the DBN will converge when the posterior 
probabilities of all variables $x_i[t]$ reach stable values 
regardless of the occurrence of a variety of evidences. 